Machine Identification of Words

If most of the letters of a simple substitution cryptogram
have been determined, there may be several possibilities for those words
which contain unidentified letters. For example, we may have the word
N E . T, with substitutes found for all the letters except S and W. The
correct choice will usually present no problems to the human cryptanalyst,
but would be rather difficult for a machine. The following suggestion is
proposed as a first faltering step in the direction of doing this by
machine.

Assume each word stored in the machine has associated with it
a symbol showing the category or field in which it is likely to occur.
Most of the words will have a general symbol meaning that they could
occur in almost any field. When a cryptogram has been solved as far as
possible, it is broken up into word lengths by the method described in the
attached "Machine Separation of Words". Then the text is scanned for
incomplete words and these fragments are compared with the words in the
machine's dictionary for possible words these fragments may represent
(taking into account the fact that all missing letters must be found
among the set of letters still unidentified). If only one such word can
be found, it is printed out at this point and the new letters so found
are added in other places in the message if they occur. If two or more words
are found which might fit at this point, a frequency count of the category
symbols of the other words of the message is taken and compared with the
category symbols of the candidate words. The word with the symbol having
the highest frequency (except the general symbol) is taken as the text
word.
As an example, suppose we have the following message:

TO COMMANDER SECOND ARMY ENEMY PATROLS
HAVE PENETRATED OUR RIGHT .LANK TO A
TWO MILE DEPTH

The word .LANK, with J, X and Z still unidentified, might be
BLANK (Gen.) or FLANK (Mil.). On counting the category symbols,
the machine finds 4 words from the Military category and 12 words from the
General category. Since the General symbol is excluded from the comparison,
it therefore selects FLANK as the correct word.

This is an extremely simple illustration. The number
of categories may have to be increased greatly, the words in the General
category may have to be arranged in several ways, provision may have to be
made for using pairs of consecutive words or groups of longer length, etc.
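
The category count described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary and its category symbols are invented for the example, and the set of unidentified letters is assumed to include B and F so that both candidates qualify.

```python
from collections import Counter

GENERAL = "Gen"  # symbol for words that may occur in almost any field

# Hypothetical dictionary (word -> category symbol), assumed for illustration.
DICTIONARY = {
    "COMMANDER": "Mil", "ARMY": "Mil", "ENEMY": "Mil", "PATROLS": "Mil",
    "FLANK": "Mil", "BLANK": "Gen", "TO": "Gen", "SECOND": "Gen",
    "HAVE": "Gen", "PENETRATED": "Gen", "OUR": "Gen", "RIGHT": "Gen",
    "A": "Gen", "TWO": "Gen", "MILE": "Gen", "DEPTH": "Gen",
}

MESSAGE = ("TO COMMANDER SECOND ARMY ENEMY PATROLS HAVE PENETRATED "
           "OUR RIGHT .LANK TO A TWO MILE DEPTH").split()


def candidates(fragment, dictionary, unidentified):
    """Dictionary words that fit the fragment; '.' marks a missing letter."""
    found = []
    for word in dictionary:
        if len(word) != len(fragment):
            continue
        # A missing letter may only be one of the letters still unidentified.
        if all(f == w or (f == "." and w in unidentified)
               for f, w in zip(fragment, word)):
            found.append(word)
    return found


def identify(fragment, message, dictionary, unidentified):
    """Pick the candidate whose category is most frequent in the message."""
    found = candidates(fragment, dictionary, unidentified)
    if len(found) <= 1:
        return found[0] if found else None
    counts = Counter(dictionary[w] for w in message if w in dictionary)

    def score(word):
        symbol = dictionary[word]
        # The general symbol is left out of the comparison.
        return 0 if symbol == GENERAL else counts[symbol]

    return max(found, key=score)
```

Calling `identify(".LANK", MESSAGE, DICTIONARY, set("BFJXZ"))` compares the Military count against zero for the General candidate, which is one simple way of honoring the "except the general symbol" rule.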
Machine Separation of Words

Assume a machine has stored in it a dictionary of the most
common words in a particular type of traffic, and a set of instructions
as follows: Take the first letter of the message and look through the
dictionary to see whether there is a word beginning with this letter. If
there is, proceed to the second letter. If there is a word in the
dictionary beginning with these two letters, proceed to the third letter, etc.
Also note at every point whether the letters up to this point represent a
complete word. If there is a complete word in the dictionary corresponding
to the first r letters and no word beginning with letters 1 to r + 1, print
out the word represented by letters 1 to r and repeat the same process
beginning with the (r + 1)st letter. If there are words beginning with
letters k to k + r (but not a complete word), and no word beginning with
letters k to k + r + 1, go back to the last complete word which has not been
used and repeat the process.

As an example suppose we have the message

ATANKERFLYINGTHEFINNISHFLAGWASSIGHTEDATZEROSEVENHUNDRED

Let the symbol x mean 'no word beginning with these letters is in the
dictionary, go back to last complete word', . mean 'a word beginning
with these letters is in the dictionary, proceed to next letter', and
o mean 'this is a complete word, proceed to next letter'. The various
steps in dividing this message into word lengths would then appear as
follows:
Step   Grouping               Symbol

  1    A                      o
  2    AT                     o
  3    ATA                    x
  4    AT A                   o
  5    AT AN                  o
  6    AT ANK                 x
  7    AT AN K                .
  8    AT AN KE               .
  9    AT AN KER              x
 10    AT A N                 .
 11    AT A NK                x
 12    A T                    .
 13    A TA                   .
 14    A TAN                  o
 15    A TANK                 o
 16    A TANKE                .
 17    A TANKER               o
 18    A TANKERF              x
 19    A TANKER F             .
 20    A TANKER FL            .
 21    A TANKER FLY           o
 22    A TANKER FLYI          .
 23    A TANKER FLYIN         .
 24    A TANKER FLYING        o
 25    A TANKER FLYINGT       x
 26    A TANKER FLYING T      .



The number of steps could be slightly reduced by using
some miscellaneous instructions such as requiring that if a complete word
at any point is AN, the next word must begin with a vowel. In the example
above, steps 8 and 9 would be eliminated by this instruction. However it
would probably not be worth while using any instructions except those which
refer to words in the dictionary.

There will occasionally be a word in the message which
is not in the dictionary. In that case the machine would have to be
instructed to go back in turn to each one of the complete words which had
not been used and begin again from there. If no sequence of words can be
found by regrouping the letters up to this point, the machine would print
out the words up to the inadmissible word and the next letter separately
and then proceed as though this were the beginning of a new message. For
example, suppose the word FINNISH in the message given were not in the
dictionary. The machine steps would then appear as on the sheet attached.
At step 44 all the complete words have been tried without success.
Therefore A TANKER FLYING THE F is printed out and the remainder treated like
a new message. The same situation arises at step 48. Here, however, the
machine can go back only to step [illegible] in its search for complete
words. The only possibility is IN at step 46. The machine tries IN N, IN NI
and IN NIS. At this point there is no word in the dictionary beginning with
NIS, and no complete word still untried. Therefore the machine prints out
IN N and goes on to I, IS and ISH. Eventually it gets to F, FL, FLA, FLAG,
etc. and from here on it is clear sailing.

The final version is A TANKER FLYING THE F IN N IS
H FLAG WAS SIGHTED AT ZEROS EVEN HUNDRED.
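
The handling of a word missing from the dictionary can be sketched as follows. This is one possible reading of the procedure, not a definitive one: it assumes that "the words up to the inadmissible word" means the grouping that reaches farthest into the text, and the word list (without FINNISH, with IN and IS) is invented for the illustration.

```python
# Assumed dictionary: FINNISH is deliberately absent, IN and IS are present.
WORDS = {
    "A", "AT", "AN", "TAN", "TANK", "TANKER", "FLY", "FLYING", "THE",
    "IN", "IS", "FLAG", "WAS", "SIGHTED", "ZERO", "ZEROS", "SEVEN",
    "EVEN", "HUNDRED",
}

MESSAGE = "ATANKERFLYINGTHEFINNISHFLAGWASSIGHTEDATZEROSEVENHUNDRED"


def farthest_parse(text, words):
    """Longest-first search; return (words, letters used) for the deepest grouping."""
    max_len = max(map(len, words))
    best = ([], 0)
    seen = set()

    def walk(i, path):
        nonlocal best
        if i > best[1]:
            best = (list(path), i)  # remember the first grouping to get this far
        if i == len(text):
            return True
        if i in seen:
            return False
        seen.add(i)
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                path.append(text[i:j])
                if walk(j, path):
                    return True
                path.pop()
        return False

    walk(0, [])
    return best


def separate_with_unknowns(text, words):
    """Print words as far as possible, then one stray letter, then start afresh."""
    out = []
    while text:
        parsed, used = farthest_parse(text, words)
        out.extend(parsed)
        if used < len(text):
            out.append(text[used])  # the next letter is printed separately
            used += 1
        text = text[used:]  # treat the remainder as a new message
    return out
```

Under these assumptions the sketch yields the same final version as the text, with F, N and H left standing alone where FINNISH could not be recognized.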

Note that the machine would print out ZEROS EVEN
instead of ZERO SEVEN, since it will always take the longest possible word.
The machine might be instructed to scan the finished product for all
complete words which have not been used. These may be broken up in different
ways and scores for pairs of consecutive words calculated. If, for example,
the final version contained the three consecutive words WILL BEAT ZERO ...
the machine would already have noted that BE is a complete word in the
dictionary, and on breaking BEAT at BE, the machine would also note that
AT is another complete word in the dictionary. It would then compare
WILL BEAT ZERO with WILL BE AT ZERO. WILL BE will score higher than WILL
BEAT and AT ZERO will score higher than BEAT ZERO. Therefore the text
will be changed to WILL BE AT ZERO ...
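
A rough Python sketch of this rescoring pass is given below. The pair scores are hypothetical numbers chosen only so that the WILL BEAT ZERO example behaves as described. How such scores would actually be compiled is not stated in the text.

```python
# Assumed word list and pair scores, for illustration only.
DICTIONARY = {"WILL", "BE", "AT", "BEAT", "ZERO", "A"}
PAIR_SCORES = {
    ("WILL", "BE"): 9, ("WILL", "BEAT"): 2,
    ("AT", "ZERO"): 8, ("BEAT", "ZERO"): 1,
}


def pair_score(first, second):
    """Score of two consecutive words; unseen pairs score zero."""
    return PAIR_SCORES.get((first, second), 0)


def resplit(words, dictionary, score):
    """Break a word in two when both halves are words and the pairs score higher."""
    result = []
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else None
        nxt = words[i + 1] if i + 1 < len(words) else None
        replacement = [word]
        for cut in range(1, len(word)):
            left, right = word[:cut], word[cut:]
            if left not in dictionary or right not in dictionary:
                continue
            checks = []
            if prev is not None:
                checks.append(score(prev, left) > score(prev, word))
            if nxt is not None:
                checks.append(score(right, nxt) > score(word, nxt))
            # Split only if every available neighbouring pair improves.
            if checks and all(checks):
                replacement = [left, right]
                break
        result.extend(replacement)
    return result
```

For instance, `resplit(["WILL", "BEAT", "ZERO"], DICTIONARY, pair_score)` returns the four-word reading, while a word with no improving split is left alone.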
MEMORANDUM

SUBJECT: Suggested Means for Increased Automation in Cryptanalysis.

1. GENERAL

In normal written alphabetic language there is a spatial relationship
between and among contiguous letters, particularly as regards the
vowels and consonants, which enables the mind to recognize almost
instantaneously upon the receipt of visual impulses from the eyes a feature
of such a language which we call pronounceability. It is my belief that the
pronounceability phenomenon is what the mind apprehends or recognizes first
of all and long before the succession or constellations of letters forming
words are recognized as words and the latter become intelligible. What
we need to know more about is this phenomenon of pronounceability and how
the mind or brain apprehends it. The phenomenon is probably basically
electrical in nature and the problem then is to design a machine that will
simulate whatever electrical process takes place in the brain, a process
which perhaps corresponds to the phenomenon in question. This, I believe,
would not be too difficult and I conceive of a complex of components which
would be of the nature described and would operate as a system in the
manner described below:

2. PROCEDURE

a. Assume we are dealing with a cryptographic version of a text
of 100 or more letters of good normal English which has been enciphered
monoalphabetically by a random-mixed alphabet. Assume a machine of the
digital computer type, into which impulses (in a binary code) corresponding
to the succession of the letters or characters of the cipher text are fed,
by the usual means (punched cards, perforated paper or magnetic tape, or the
like) as the input.
b. The first step for the machine, on acceptance of the input,
would be to program its functioning, according to the usual methods, to
conduct the impulses to and store them in the memory, making at the
same time and storing a uniliteral frequency count of the characters of
the cipher text. The machine then determines the 10 cipher letters most
frequently represented and sets them up in a descending order of frequency.

c. The machine's next operation is to set up 10! permutations of
equivalents, beginning with the permutation corresponding to the normal
English frequency expectancy series E, T, O, A, I, N, R, S, H, D. The
substitution equivalents of the first of these 10! permutations are then
applied by the machine to the appropriate letters of the cipher text and
at the same time impulses which will trigger off signals corresponding to
and capable of making sound spectrographic representations of the
substitutional plain-text equivalents are set up within the machine and
temporarily impressed upon a medium capable of being moved laterally (film,
magnetic tape, etc.) past a sound-spectrograph reading component.

d. The sound-producing component of the machine, actuated by the
reading component, "pronounces" the sequence of spectrographic
representations, or, at least, it attempts to do so as best it can. The
machine, furthermore, is set so that there is a lower threshold of
"pronounceability," which unless reached and passed, will cause a
"stuttering" or some phenomenon equivalent to a "lingual impediment." When
the machine finds the impediment beyond the threshold or critical value of
pronounceability (in other words, when what it tries to enunciate is
"unpronounceable" in English), it throws out the permutation selected and
begins to repeat its sequence of operations on the next of the 10!
permutations.
e. The machine should be able in this way to eliminate a great many of
the 10! permutations at a very rapid rate, retaining only a few which surpass
the critical threshold of "pronounceability". These remaining possibilities
will have to be examined visually or "listened to" by the operator.

f. Assuming the correct permutation of the 10 high-frequency
letters has been isolated, the analyst will certainly be able to fill in the
remaining letters without too much difficulty.
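
Steps b through e can be outlined in Python as a sketch only. The memorandum proposes a sound-spectrograph test for pronounceability. The code below swaps in a much cruder stand-in, the longest run of consecutive consonants, purely to show where such a threshold would sit. That proxy, the function names and the threshold value are all assumptions and are not part of the proposal.

```python
from collections import Counter
from itertools import permutations

EXPECTED = "ETOAINRSHD"  # normal English frequency expectancy series
VOWELS = set("AEIOU")


def top_ten(cipher):
    """The ten most frequent cipher letters, in descending order of frequency."""
    return [letter for letter, _ in Counter(cipher).most_common(10)]


def trial_plaintexts(cipher):
    """Yield (permutation, trial text) for each of the 10! sets of equivalents."""
    letters = top_ten(cipher)
    # permutations() starts with EXPECTED in its given order, as in step c.
    for perm in permutations(EXPECTED):
        key = dict(zip(letters, perm))
        yield perm, "".join(key.get(c, ".") for c in cipher)


def longest_consonant_run(text):
    """Crude stand-in for 'unpronounceability': longest run of consonants."""
    run = best = 0
    for ch in text:
        if ch.isalpha() and ch not in VOWELS:
            run += 1
            best = max(best, run)
        else:
            run = 0  # vowels and unknown letters ('.') break the run
    return best


def passes_threshold(text, limit=4):
    """Keep a permutation only if no consonant run exceeds the assumed limit."""
    return longest_consonant_run(text) <= limit
```

Filtering `trial_plaintexts(cipher)` with `passes_threshold` would leave the survivors for the operator to examine, as in step e. Running through all 3,628,800 permutations is feasible but slow in plain Python, and a proxy this crude would probably retain far more than "a few".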
REF ID:A60419 



SMDMAGDVVMPXRDCIOBPDDCVQXRIGV
LEFMOFYLFIMVPTESTLEDPVDIOLEFG
MTFMTXDMROVFCSCLEPCYIMSMAIQGD



ABCDEFGHIJKLMNOPQRST 
1. A frequency count of letters and digraphs was made as
well as a chart showing the letters preceding and following each of the
cipher letters. From these a table of reversed digraphs was made.

2. The low frequency letters were marked as consonants. The
letters occurring only once or twice account for only [illegible] of the
101 letters of the message. The letters occurring 3 times were therefore added.
This brought the frequency to 11, which was still below the 20% threshold.
However if letters occurring 4 times are included, the total frequency
would be 23. There are tests which might enable us to decide which of the
4-frequency letters to include and which to exclude, but this might be
difficult to build into a machine.
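
The counts in step 1 are straightforward to mechanize. The following Python sketch (function names are illustrative assumptions) tallies digraphs and lists those that also occur reversed, with the count in each direction.

```python
from collections import Counter


def digraph_counts(text):
    """Count every pair of adjacent letters in the text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def reversed_digraphs(text):
    """Map (digraph, reversal) -> (count, count of reversal) where both occur."""
    counts = digraph_counts(text)
    table = {}
    for digraph, n in counts.items():
        reverse = digraph[::-1]
        # List each pair once and skip doubled letters such as DD.
        if digraph[0] < digraph[1] and reverse in counts:
            table[(digraph, reverse)] = (n, counts[reverse])
    return table
```

Applied to a complete cipher text, the table would contain entries of the kind used later in the analysis, for example a digraph occurring three times whose reversal occurs once.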
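
A simple mechanical version of step 2 is sketched below in Python. It is an assumption-laden simplification: whole frequency levels are admitted, lowest first, for as long as the running total stays within the 20% share, so it stops before the 4-frequency letters rather than choosing among them.

```python
from collections import Counter


def low_frequency_letters(text, share=0.20):
    """Return (letters, total count) for the lowest frequency levels within share."""
    counts = Counter(text)
    budget = share * len(text)
    chosen, total = [], 0
    for level in sorted(set(counts.values())):
        group = sorted(c for c, n in counts.items() if n == level)
        if total + level * len(group) > budget:
            break  # admitting this whole level would pass the threshold
        chosen.extend(group)
        total += level * len(group)
    return chosen, total
```

Deciding which letters of the first excluded level to admit, as the text notes, would need further tests that this sketch does not attempt.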

3. A table was made of the letters preceding and following
each of the assumed consonants A, B, Q, Z, B and T. All the letters
except C, E and Y appeared. Since letters which never contact low-frequency
consonants are very likely consonants themselves, these three were added
to the assumed consonants. The contact table then appeared as follows:



A B Q Z B T        C E Y

M M M M M
G G
D D D
D D
S S S
I I I I
L L L L L
L L
F F
F F F F
P P
P P
Y Y
C
E
M, D, S, I, L, F and P seem to be the best prospects for vowels. The
substitutes for A, E, I and O should be found in this group.
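
The contact test in step 3 can be sketched in Python as follows. The function names are illustrative, and the sketch only finds which letters never touch an assumed consonant. It does not rebuild the tally columns of the table.

```python
def contacts(text, letter):
    """Letters found immediately before or after any occurrence of letter."""
    found = set()
    for i, ch in enumerate(text):
        if ch == letter:
            if i > 0:
                found.add(text[i - 1])
            if i + 1 < len(text):
                found.add(text[i + 1])
    return found


def never_touching(text, consonants):
    """Letters of the text that never contact any assumed consonant."""
    touched = set()
    for consonant in consonants:
        touched |= contacts(text, consonant)
    return sorted(set(text) - touched - set(consonants))
```

Letters returned by `never_touching` would be added to the assumed consonants, and the letters appearing most often in `contacts` would be the vowel prospects.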

4. The reversed digraphs were studied with a view to
identifying vowels. Since most reversed digraphs will be of the form
c-v or v-c, it should be possible to separate most of the letters which
reverse into two classes, these being identified as consonant or vowel
by reference to (3). However since word divisions are not shown these
indications may be somewhat blurred because the first letter of one word
may reverse with the last letter of the previous word. As a start only
digraphs which reversed more than once were considered.

SM 3    TF 2    DM 2    VF 2    FL 2
MS 1    FT 1    MD 1    FV 1    LF 1

These can be separated into two classes in two ways, (a) MF-DSVTL
and (b) MVTL-DSF. From (3), V and T are not very good prospects
for vowels; therefore D, S, V, T and L in (a) probably represent the
consonants. This would make M, I, F and P the only good prospects to
represent A, E, I and O. However we have the following contacts among these
four letters: MP 1, IM 2, FM 2, FI 1. In (b), M, V, T and L
probably represent the consonants. This would leave A, E, I and O to be
found among D, S, F, I and P. There are only the following contacts
among these five letters: DI 1, DP 1, FI 1, PD 1. Since one of
these five letters would be eliminated, the number of these presumed
v-v contacts would probably be reduced further. All in all (b) seems
like the more probable set-up.
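
The separation into two classes is, in effect, a two-colouring of the letters linked by reversed digraphs. There are two ways here because the five digraphs fall into two independent groups, one around M and one around F. A Python sketch, with illustrative names, follows.

```python
from itertools import product


def two_class_splits(pairs):
    """All ways to put the letters of reversing digraphs into two opposed classes."""
    adjacent = {}
    for a, b in pairs:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)

    colour, components = {}, []
    for start in sorted(adjacent):
        if start in colour:
            continue
        colour[start] = 0
        sides, stack = (set(), set()), [start]
        while stack:
            node = stack.pop()
            sides[colour[node]].add(node)
            for other in adjacent[node]:
                if other not in colour:
                    colour[other] = 1 - colour[node]
                    stack.append(other)
                elif colour[other] == colour[node]:
                    raise ValueError("letters cannot be split into two classes")
        components.append(sides)

    if not components:
        return []
    splits = []
    # Keep the first group fixed and try each later group both ways round.
    for flips in product((0, 1), repeat=len(components) - 1):
        left, right = set(components[0][0]), set(components[0][1])
        for (a, b), flip in zip(components[1:], flips):
            left |= b if flip else a
            right |= a if flip else b
        splits.append(("".join(sorted(left)), "".join(sorted(right))))
    return splits
```

For the five digraphs above, `two_class_splits(["SM", "TF", "DM", "VF", "FL"])` gives DFS against LMTV, which is arrangement (b), and DLSTV against FM, which is arrangement (a). Choosing between them still rests on the contact counts discussed in the text.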

5. An attempt was made to identify the assumed vowels,
D, S, F and the assumed consonants, M, V, T, L. D, the highest
frequency letter, which also appeared doubled, seemed almost certain to be
E. F and S were assumed to be the two next most frequent vowels, A and
O respectively. As for the consonants, M, V, T, L: V and L, which occur
doubled, seemed like good prospects for T and S respectively. M, the
highest frequency consonant, is probably N, H or R. From the first four
letters of the message, SMDM, N = M seemed by far the best choice.
The worksheet now appeared as in Fig. 1.



6. Changes were made in the plain text letter assumptions
in order to make the fragments of text recovered look more like English.

ONEN..ETTN
SMDMAGDVVM

did not seem very likely. If we interchange the values of V and L,
V = S, L = T yields

ONEN..ESSN
SMDMAGDVVM

which seems to be an improvement, although there is probably something still
wrong. ATT.E ON suggests that G is H, R or L. G = R does not
look very good in [illegible]. G = H does not look very good in

ONEN.HESSN
SMDMAGDVVM

G = L in AL.TTLE suggests P = I. The
worksheet then looked as in Fig. 2.
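
The trial substitutions of steps 5 and 6 can be replayed with a few lines of Python. The key dictionaries below simply restate the assumptions given in the text (cipher letter to plain letter), and the helper name is illustrative.

```python
def apply_key(cipher, key):
    """Replace known cipher letters by plain letters; unknown ones become '.'."""
    return "".join(key.get(c, ".") for c in cipher)


# Step 5 assumptions: D = E, F = A, S = O, V = T, L = S, M = N.
STEP5 = {"D": "E", "F": "A", "S": "O", "V": "T", "L": "S", "M": "N"}

# Step 6: interchange the values of V and L, then try G = L.
STEP6 = dict(STEP5, V="S", L="T")
STEP6_G = dict(STEP6, G="L")
```

With these keys the opening group SMDMAGDVVM reads ONEN..ETTN, then ONEN..ESSN, then ONEN.LESSN, which shows how each change moves the fragment toward the reading given in step 7.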



7. The solution was completed using these recoveries.
NEN.LESS is obviously AN ENDLESS.



